Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning

نویسنده

  • Gerald Tesauro
چکیده

This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace. For a single adaptive agent facing xed-strategy opponents, ordinary Q-learning is guaranteed to nd the optimal policy. However , for a population of agents each trying to adapt in the presence of other adaptive agents, the problem becomes non-stationary and history dependent, and it is not known whether any global convergence will be obtained, and if so, whether such solutions will be optimal. This paper studies simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough so that lookup tables can be used to represent the Q-functions. Despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, such convergence is also found even at large discount parameters. Furthermore, the Q-derived policies increase prootability and damp out or eliminate cyclic price \wars" compared to simpler policies based on zero lookahed or short-term lookahead. The use of function approximators (neural nets) instead of lookup tables is also investigated; preliminary ndings indicate that reasonably good policies can be obtained even though the absolute accuracy of the function approximation may be poor.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions

We study the use of single-agent and multi-agent Q-learning to learn seller pricing strategies in three diierent two-seller models of agent economies, using a simple regression tree approximation scheme to represent the Q-functions. Our results are highly encouraging { regression trees match the training times and policy performance of lookup table Q-learning, while ooering signiicant advantage...

متن کامل

An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

RoboCup competition as a great test-bed, has turned to a worldwide popular domains in recent years. The main object of such competitions is to deal with complex behavior of systems whichconsist of multiple autonomous agents. The rich experience of human soccer player can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...

متن کامل

User-based Vehicle Route Guidance in Urban Networks Based on Intelligent Multi Agents Systems and the ANT-Q Algorithm

Guiding vehicles to their destination under dynamic traffic conditions is an important topic in the field of Intelligent Transportation Systems (ITS). Nowadays, many complex systems can be controlled by using multi agent systems. Adaptation with the current condition is an important feature of the agents. In this research, formulation of dynamic guidance for vehicles has been investigated based...

متن کامل

Adaptive Leader-Following and Leaderless Consensus of a Class of Nonlinear Systems Using Neural Networks

This paper deals with leader-following and leaderless consensus problems of high-order multi-input/multi-output (MIMO) multi-agent systems with unknown nonlinear dynamics in the presence of uncertain external disturbances. The agents may have different dynamics and communicate together under a directed graph. A distributed adaptive method is designed for both cases. The structures of the contro...

متن کامل

An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources

This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage is employed a PID controller which its parameters are designed using sine cosine optimization (SCO...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001